In this lesson, students will learn how to work with spatial data, or data with geographic information. There are different packages that in R that handle spatial data. In these packages, spatial data is stored as a special class (ex. character, number, etc.) which stores the geographic information.
To start, let’s plot North Carolina with county lines.
ggplot2 contains some mapping data. Here, we will use the
function map_data to get the USA map data, and then filter
by “north carolina”.
library(ggplot2)
library(dplyr)
map <- map_data("county")
nc <- filter(map, region =="north carolina")
head(nc)
## long lat group order region subregion
## 1 -79.53800 35.84424 1857 54915 north carolina alamance
## 2 -79.54372 35.89008 1857 54916 north carolina alamance
## 3 -79.53800 35.98175 1857 54917 north carolina alamance
## 4 -79.52081 36.23385 1857 54918 north carolina alamance
## 5 -79.26298 36.23385 1857 54919 north carolina alamance
## 6 -79.27444 35.90726 1857 54920 north carolina alamance
In this data, there are points that will be connected into polygons to plot each county based on the “group”. In the aesthetics, we set x equal to the longitude and y equal to the latitude.
ggplot(data = nc, aes(x = long, y = lat, group = group)) +
geom_polygon()
You can use the edit the plot in the same way that you did other plots
in ggplot. For example, I’ll color each of the the counties a different
color and change the theme.
ggplot(data = nc, aes(x = long, y = lat, group = group)) +
geom_polygon(aes( fill = subregion))+
theme_minimal()+
theme(legend.position = 'none')
With plotting spatial data, you do have to consider the coordinate system, which determines how the spherical data from the earth is plotted flattened as a 2-D map. There are different coordinate systems that you can use. The default is cartesian coordinates, where x and y coordinates are 1:1.
See what happens when you change the coordinate systems using the following:
coord_map(): Mercator projectioncoord_map("conic", lat0 = 30)coord_map('gilbert')
1.coord_map('orthographic')In the last example, we plotted a polygon simply by connecting points
with lines in ggplot. However, other data sources have
polygon spatial data stored differently. In the next example, we will
plot using a shape (.shp) file in the package sf. We will
plot the major rivers or streams in Wake County. The file is already in
your Posit Cloud workspace. To load in the shape file using
sf, you will use the function st_read
library(sf)
streams <- st_read('data/Major_Rivers/')
You can already see that this is different than a regular csv file, with infomration about the geographic data shown as you import the data. Specifically, you can see that the coordinates are XY, a “bounding box” or dimensions, and a coordinate system (Projected CRS).
You can also see in the data a column for the geometry. This determines the shape of each stream.
head(streams)
## Simple feature collection with 6 features and 5 fields
## Geometry type: LINESTRING
## Dimension: XY
## Bounding box: xmin: 615889.6 ymin: 218627.2 xmax: 616459.2 ymax: 219804.9
## Projected CRS: NAD83 / North Carolina
## OBJECTID FTYPE FCODE NAME RCH_CODE
## 1 1 STREAM/RIVER 46000 Reedy Branch 03030002002226
## 2 2 STREAM/RIVER 46000 Reedy Branch 03030002002226
## 3 3 STREAM/RIVER 46000 Reedy Branch 03030002002226
## 4 4 STREAM/RIVER 46000 Beaver Creek 03030002000011
## 5 5 CONNECTOR 33400 Beaver Creek 03030002000011
## 6 6 CONNECTOR 33400 Beaver Creek 03030002000012
## geometry
## 1 LINESTRING (616459.2 219804...
## 2 LINESTRING (616355.1 219793...
## 3 LINESTRING (616225.6 219798...
## 4 LINESTRING (615944.7 218709...
## 5 LINESTRING (615895.2 218702...
## 6 LINESTRING (616331.7 218627...
streams$geometry[1]
## Geometry set for 1 feature
## Geometry type: LINESTRING
## Dimension: XY
## Bounding box: xmin: 616369 ymin: 219792.1 xmax: 616459.2 ymax: 219804.9
## Projected CRS: NAD83 / North Carolina
## LINESTRING (616459.2 219804.9, 616452.3 219801....
You can use the functions st_bbox to find the boundaries
of the file and st_crs to view the coordinate system
information.
st_bbox(streams)
## xmin ymin xmax ymax
## 612248.0 197064.4 677032.4 257607.0
st_crs(streams)
## Coordinate Reference System:
## User input: NAD83 / North Carolina
## wkt:
## PROJCRS["NAD83 / North Carolina",
## BASEGEOGCRS["NAD83",
## DATUM["North American Datum 1983",
## ELLIPSOID["GRS 1980",6378137,298.257222101,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## ID["EPSG",4269]],
## CONVERSION["SPCS83 North Carolina zone (meters)",
## METHOD["Lambert Conic Conformal (2SP)",
## ID["EPSG",9802]],
## PARAMETER["Latitude of false origin",33.75,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8821]],
## PARAMETER["Longitude of false origin",-79,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8822]],
## PARAMETER["Latitude of 1st standard parallel",36.1666666666667,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8823]],
## PARAMETER["Latitude of 2nd standard parallel",34.3333333333333,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8824]],
## PARAMETER["Easting at false origin",609601.22,
## LENGTHUNIT["metre",1],
## ID["EPSG",8826]],
## PARAMETER["Northing at false origin",0,
## LENGTHUNIT["metre",1],
## ID["EPSG",8827]]],
## CS[Cartesian,2],
## AXIS["easting (X)",east,
## ORDER[1],
## LENGTHUNIT["metre",1]],
## AXIS["northing (Y)",north,
## ORDER[2],
## LENGTHUNIT["metre",1]],
## USAGE[
## SCOPE["Engineering survey, topographic mapping."],
## AREA["United States (USA) - North Carolina - counties of Alamance; Alexander; Alleghany; Anson; Ashe; Avery; Beaufort; Bertie; Bladen; Brunswick; Buncombe; Burke; Cabarrus; Caldwell; Camden; Carteret; Caswell; Catawba; Chatham; Cherokee; Chowan; Clay; Cleveland; Columbus; Craven; Cumberland; Currituck; Dare; Davidson; Davie; Duplin; Durham; Edgecombe; Forsyth; Franklin; Gaston; Gates; Graham; Granville; Greene; Guilford; Halifax; Harnett; Haywood; Henderson; Hertford; Hoke; Hyde; Iredell; Jackson; Johnston; Jones; Lee; Lenoir; Lincoln; Macon; Madison; Martin; McDowell; Mecklenburg; Mitchell; Montgomery; Moore; Nash; New Hanover; Northampton; Onslow; Orange; Pamlico; Pasquotank; Pender; Perquimans; Person; Pitt; Polk; Randolph; Richmond; Robeson; Rockingham; Rowan; Rutherford; Sampson; Scotland; Stanly; Stokes; Surry; Swain; Transylvania; Tyrrell; Union; Vance; Wake; Warren; Washington; Watauga; Wayne; Wilkes; Wilson; Yadkin; Yancey."],
## BBOX[33.83,-84.33,36.59,-75.38]],
## ID["EPSG",32119]]
To plot the streams, you can use the layer geom_sf. You
do not need to specific any x and y values here. Also, because there is
already a coordinate system associated with the file, ggplot will
automatically plot in the given coordinate system.
ggplot() +
geom_sf(data = streams, color = 'blue')
Although the data has associated geometries, you can still manipulate
the data frame, for example, using dplyr.
We can use distinct to find each stream in the data or
can filter for specific streams.
streams |>
distinct(NAME)
## NAME
## 1 Reedy Branch
## 2 Beaver Creek
## 3 Big Branch
## 4 Middle Creek
## 5 Little Beaver Creek
## 6 Thomas Creek
## 7 Terrible Creek
## 8 Kenneth Creek
## 9 Mills Branch
## 10 Hector Creek
## 11 Little White Oak Creek
## 12 White Oak Creek
## 13 Little Branch
## 14 Buckhorn Creek
## 15 Haleys Branch
## 16 Kit Creek
## 17 Buffalo Creek
## 18 Little River
## 19 Marks Creek
## 20 Utley Creek
## 21 Basal Creek
## 22 Neills Creek
## 23 Black Creek
## 24 Little Black Creek
## 25 Beaverdam Creek
## 26 Neuse River
## 27 Horse Creek
## 28 Mud Branch
## 29 Lowery Creek
## 30 New Light Creek
## 31 Water Fork
## 32 Little Beaverdam Creek
## 33 Lower Barton Creek
## 34 Upper Barton Creek
## 35 Honeycutt Creek
## 36 Pierce Creek
## 37 Lick Creek
## 38 Guffy Branch
## 39 Little Creek
## 40 Swift Creek
## 41 Mahlers Creek
## 42 Whiteoak Creek
## 43 Poplar Creek
## 44 Cedar Creek
## 45 Panther Branch
## 46 Juniper Branch
## 47 Sanford Creek
## 48 Harris Creek
## 49 Dutchmans Branch
## 50 Richland Creek
## 51 Austin Creek
## 52 Perry Creek
## 53 Smith Creek
## 54 Hominy Creek
## 55 Cedar Fork
## 56 Moccasin Creek
## 57 Fowlers Mill Creek
## 58 Crabtree Creek
## 59 Turkey Creek
## 60 Coles Branch
## 61 Walnut Creek
## 62 Camp Branch
## 63 Rocky Branch
## 64 Lynn Branch
## 65 Speight Branch
## 66 Long Branch
## 67 Snipes Creek
## 68 Nancy Branch
## 69 Morris Branch
## 70 Bachelor Branch
## 71 Jack Branch
## 72 Reedy Creek
## 73 House Creek
## 74 Southwest Prong Beaverdam Creek
## 75 Panther Creek
## 76 Brier Creek
## 77 Pigeon House Branch
## 78 Wildcat Branch
## 79 Hare Snipe Creek
## 80 Toms Creek
## 81 Northeast Creek
## 82 Indian Creek
## 83 Pots Branch
## 84 Rocky Creek
## 85 Marsh Creek
## 86 Bridges Branch
## 87 Cemetery Branch
## 88 Mango Creek
## 89 Mingo Creek
## 90 Cabtree Creek
## 91 UTLEY CREEK
## 92 Crabtree Branch
streams |>
filter(NAME == 'Cemetery Branch')
## Simple feature collection with 33 features and 5 fields
## Geometry type: LINESTRING
## Dimension: XY
## Bounding box: xmin: 643169 ymin: 225367.5 xmax: 643744.3 ymax: 227726.8
## Projected CRS: NAD83 / North Carolina
## First 10 features:
## OBJECTID FTYPE FCODE NAME RCH_CODE
## 1 2129 CONNECTOR 33400 Cemetery Branch 03020201002398
## 2 2130 CONNECTOR 33400 Cemetery Branch 03020201002398
## 3 2132 CONNECTOR 33400 Cemetery Branch 03020201002398
## 4 2133 CONNECTOR 33400 Cemetery Branch 03020201002398
## 5 2135 STREAM/RIVER 46000 Cemetery Branch 03020201002398
## 6 2137 CONNECTOR 33400 Cemetery Branch 03020201002398
## 7 2138 CONNECTOR 33400 Cemetery Branch 03020201002398
## 8 2139 CONNECTOR 33400 Cemetery Branch 03020201002398
## 9 2140 CONNECTOR 33400 Cemetery Branch 03020201002398
## 10 2577 STREAM/RIVER 46000 Cemetery Branch 03020201002398
## geometry
## 1 LINESTRING (643609.2 227309...
## 2 LINESTRING (643557.5 227237...
## 3 LINESTRING (643521.4 227042...
## 4 LINESTRING (643403.4 226693...
## 5 LINESTRING (643397 226439.2...
## 6 LINESTRING (643294.6 226121...
## 7 LINESTRING (643214.5 225898...
## 8 LINESTRING (643170.5 225626...
## 9 LINESTRING (643240.6 225510...
## 10 LINESTRING (643605 227283.4...
For fun, lets plot just Cemetery Branch over the other streams in a different color to highlight it.
cemetery_branch<- filter(streams, NAME == 'Cemetery Branch')
ggplot() +
geom_sf(data = streams, color ='cornflowerblue') +
geom_sf(data = cemetery_branch, color = 'red') +
theme_minimal()
In addition, you can layer multiple shape files. In your data folder, you also have a shape file for the Wake County Line.
Exercise Plot the Wake County line with the streams.
It is pretty straightforward to plot points on maps since you can plot based on the x and y coordinates. You can plot points in the same way that you have done before with x and y values, but with gps coordinates, the x is longitude and the y is latitude.
You can also use the function st_point in the
sf package for geographic points.
park <- st_point(c(-78.6247498208728, 35.7947396684644))
To add the point to the plot, again use geom_sf since
you have make stored the point as an sf object. What
happens when you add the point to the previous plot?
When plotting different layers, you have to make sure the coordinate systems are the same.
What happened here?
You can see that the units for each are still not the same.
st_bbox(park)
## xmin ymin xmax ymax
## -78.62475 35.79474 -78.62475 35.79474
st_bbox(streams)
## xmin ymin xmax ymax
## 612248.0 197064.4 677032.4 257607.0
We can change the coordinate systems using st_transform. We will do
this for each of the objects to make sure they are on the same system.
Most coordinate systems have a unique EPSG number. Here, we will use EPSG 4269, which is the same North
American Datum 1983 system as before, but by transforming it, it will
turn units to GPS coordinates. We also have to set the crs when making
the point in for the point by setting the crs with
st_stc.
streams <- st_transform(streams, crs = 4269)
park <- st_sfc(park,crs =4269)
Now, plot again.
ggplot() +
geom_sf(data = streams, color = 'blue') +
geom_sf(data = park, color = 'red') +
theme_minimal()
This previous way of plotted points used a spatial data object for
the points. However, it can also be done by simply using
geom_point()
ggplot() +
geom_sf(data = streams, color = 'blue')+
geom_point(aes(x = -78.62475, y = 35.79474), color = 'red') +
theme_minimal()
We are only scratching the surface of all you can do with spatial
data. For example, you may want to work with multiple shape files or
different types of spatial data. There are many functions in
sf to work with multiple files including: - st_intersection
- st_join - st_crop - st_mask - st_union
As you work on different projects, you can investigate these more.
One example, however, is to crop one shape file based on another. Let’s
say we only want to show the streams that are in Raleigh, since that is
what we are using for our project. We can use st_crop to
crop the streams map based on a shape file with the polygon for the City
of Raleigh.
towns <- st_read('data/Townships-shp/')
The Townships data has polygones for every township in Wake County
ggplot()+
geom_sf(data= towns)
We will just get the township for Raleigh
raleigh <- filter(towns, NAME == 'RALEIGH')
Now, we can use st_crop to crop the streams based on the
shape file for Raleigh. Again, we have to make the coordinates the
same.
raleigh <- st_transform(raleigh, crs = 4269)
raleigh_stream <- st_crop(streams, raleigh)
## Warning: attribute variables are assumed to be spatially constant throughout
## all geometries
ggplot(data = raleigh_stream) +
geom_sf()
In our last example with shape files, we will go over displaying data in polygons. In the data frames that include the geometries, you can also have associated data with each polygon. This data can be plotted with the polygon, for example as color.
Here, we have a shape file for the census tract in Wake County and their associated CDC social vulnerability index.
Again, we will load in the data with st_read.
library(sf)
svi <- st_read('data/CDC_SVI/')
As an example, let’s plot the proportion of the population without a high school diploma for each tract.
ggplot(data = svi) +
geom_sf(aes(fill = EP_NOHSDP))
The other main type of spatial data is raster data, where every point
or pixel contains information. In this example, we will work with
climate data from PRISM with the University of Oregon. We can download
the data with the prism package. This has been done for you
and the data is in your folder.
We will also use the terra package to work with the
raster files. We will first use the rast function to load
in a raster object with the average temperature for 2023.
library(terra)
temp2023 <-rast('data/tmean_2023.grd')
Now, lets look at the file. You can see it has similar format to the spatial data from the shape files, with projection, units, etc.
temp2023
## class : SpatRaster
## dimensions : 621, 1405, 1 (nrow, ncol, nlyr)
## resolution : 0.04166667, 0.04166667 (x, y)
## extent : -125.0208, -66.47917, 24.0625, 49.9375 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=NAD83 +no_defs
## source : tmean_2023.grd
## name : PRISM_tmean_provisional_4kmM3_2023_bil
## min value : -4.4730
## max value : 27.1383
For a simple look at the raster, you can use plot. You
can see that each pixel has an associated mean temeperature value across
the US.
plot(temp2023)
Let’s say however, that we wanted to compare the temperature from the
1990’s to now. To do this, we will get data for each year from 1991-1995
and 2019-2023. You could do more years, but we are keeping it small
because it can be slow to work with large raster datasets in R. These
data sets are in your data folder. Again, we will use rast
to load them in.
tmean_90s <- rast('data/tmean_1991_1995.grd')
tmean_20s <- rast('data/tmean_2019_2023.grd')
Now, see what happens when you plot one of the files you just imported. You see that a raster stack contains many raster “layers” - here, one for each year.
plot(tmean_90s)
With rasters, you can perform mathematical functions on the rasters.
What this does is performs the math on each of the matching pixels. We
can get the average temperature for each set using
mean.
avg90s <- mean(tmean_90s)
avg20s <- mean(tmean_20s)
Again, what happens when we plot?
plot(avg90s)
Now, we can see the difference between the two groups by simply subtracting.
temp_change <- avg20s - avg90s
plot(temp_change)
What do you notice about the plot?
The last section showed the basics of rasters in R. But now, let’s
say you wanted to combine the raster information with the shape files
that you have. To plot the raster data, we will use another package
tidyterra which will allow us to combine the geom layers
from sf in ggplot2
To make this a simpler, we will crop the file down to just Wake County, which we are working with. It is very slow to plot maps like this with ggplot. We will go back the the county line file to get the bounds for cropping the raster. This can take a minute to run.
county <- st_read('../../data/Wake_County_Line-shp/')
## Reading layer `Wake_County_Line' from data source
## `/Users/maryglover/Documents/wpu-bio231/data/Wake_County_Line-shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 1 feature and 2 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: 2001465 ymin: 644117.9 xmax: 2221231 ymax: 846796.2
## Projected CRS: NAD83 / North Carolina (ftUS)
county <- st_transform(county, crs = 4269)
wake_temp <- crop(avg20s, county)
wake_temp <- mask(wake_temp, county)
To plot this, we can then use the geom_spatraster from
the tidyterra package. Here, to show the temperature, we
have to set the fill to the column “mean”.
library(tidyterra)
##
## Attaching package: 'tidyterra'
## The following object is masked from 'package:stats':
##
## filter
ggplot()+
geom_spatraster(data = wake_temp, aes(fill = mean)) +
theme_minimal()
There are multiple packages that
theme_map colorRampPalette(rev(brewer.pal(11, “Spectral”)), space=“Lab”)
ggplot(maxent.df, aes(x=x, y=y)) +
geom_tile(aes(fill=suitability)) +
geom_polygon(data=map_data("state"),
aes(x=long, y=lat, group=group),
col="black",
fill='white', alpha=0) +
plot_area+
scale_y_continuous(expand=c(0,0))+
scale_x_continuous(expand=c(0,0)) +
scale_fill_gradientn(colors = myPalette(100), breaks=c(0,1), limits=c(0, 1))
Plot the streams, add points for each of the locations that were evaluated by the city of raleigh. Change the colors of the points.
plot the points from each stream on the map with the streams
The following information is not required for your exercise, but could be useful if you would like to get climate data in the future.
First, you have to download the raster files that we will need. To
get the files from the example, need need average temperature data for
1991-1995 and 2019-2023. We will download the yearly data for these
years. We have to make a folder to put all the prism files. I called
mine “prism-climate-data”. Then you can use
prism_set_dl_dir to indicate where downloaded files should
go. This may take a minute to run. The more file you download or the
more resolution, the longer it will take. For example, if you are trying
to get the daily data for multiple years, this could take hours. With
PRISM, you can download daily, monthly, or yearly data for average
temperature (tmean), minimum temperature (tmin), maximum temperature
(tmin), and precipitation (ppt). Use ?get_prism_annual for
more information and examples.
So here is the code to download the yearly average temperature
This only downloaded the files – we still need to load them into our
R workspace. To do this, we will first use
prism_archive_subset which will allow us to specify which
files we want to load into R. First, we want to get the ‘tmean’ files
for 1991-1995. This gives us a list of the files we want to load in.
Then we load the files and “stack” them using pd_stack
tmean90s_files<- prism_archive_subset('tmean', 'annual', years = 1991: 1995)
tmean90s <- pd_stack(tmean90s_files)
This gives you the raster stack that you can work with.